Synonym Discovery for Structured Entities on Heterogeneous Graphs
Abstract
With the increasing use of entities in serving people's daily information needs, recognizing synonyms (the different ways people refer to the same entity) has become a crucial task for many entity-leveraging applications. Previous works often take a "literal" view of an entity, i.e., its string name. In this work, we propose adopting a "structured" view of each entity by considering not only its string name but also other important structured attributes. Unlike existing query log-based methods, we delve deeper to explore sub-queries, and exploit tailed synonyms and tailed web pages to harvest more synonyms. We design a general, heterogeneous graph-based data model that encodes our problem insights by capturing three key concepts (synonym candidate, web page, and keyword) and the different types of interactions between them. We cast synonym discovery as a graph-based ranking problem and show that a closed-form optimal solution exists for computing entity synonym scores. Experiments on several real-life domains demonstrate the effectiveness of the proposed method.
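The abstract does not state the model's objective, so the following is only a minimal sketch of the general idea under stated assumptions. It flattens the candidate/web page/keyword graph into one weighted undirected graph and swaps in a standard manifold-ranking style objective (smoothness over edges plus fit to a prior y), whose minimizer has the closed form f* = (1 - α)(I - αS)^{-1} y with S = D^{-1/2} W D^{-1/2}. The toy nodes, edge weights, the keyword seed prior, α = 0.8, and every helper name (build_graph, normalize, solve, rank_candidates) are hypothetical and are not the authors' formulation.

```python
import math
from itertools import chain

# Hypothetical toy graph: nodes are (type, name) tuples with type in
# {"cand", "page", "kw"}; weights stand in for click counts or term
# co-occurrence. None of this data comes from the paper.
TOY_EDGES = [
    (("cand", "office 2010"), ("page", "p1"), 3.0),
    (("cand", "ms office 2010"), ("page", "p1"), 1.0),
    (("cand", "ms office 2010"), ("page", "p2"), 2.0),
    (("page", "p1"), ("kw", "office"), 1.0),
    (("page", "p1"), ("kw", "2010"), 1.0),
    (("page", "p2"), ("kw", "office"), 1.0),
    (("cand", "office 2010"), ("kw", "office"), 1.0),
    (("cand", "office 2010"), ("kw", "2010"), 1.0),
    (("cand", "ms office 2010"), ("kw", "office"), 1.0),
    (("cand", "ms office 2010"), ("kw", "2010"), 1.0),
    (("cand", "office 2007"), ("page", "p3"), 2.0),
    (("cand", "office 2007"), ("kw", "2007"), 1.0),
    (("page", "p3"), ("kw", "2007"), 1.0),
]
# Assumed prior: keywords taken from the entity's structured attributes.
TOY_PRIOR = {("kw", "office"): 1.0, ("kw", "2010"): 1.0}


def build_graph(edges):
    """Flatten typed edges into one symmetric weight matrix W."""
    nodes = sorted(set(chain.from_iterable((u, v) for u, v, _ in edges)))
    index = {node: i for i, node in enumerate(nodes)}
    W = [[0.0] * len(nodes) for _ in nodes]
    for u, v, w in edges:
        i, j = index[u], index[v]
        W[i][j] += w
        W[j][i] += w
    return nodes, index, W


def normalize(W):
    """Symmetric normalization S = D^{-1/2} W D^{-1/2}."""
    deg = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def rank_candidates(edges, prior, alpha=0.8):
    """Closed-form ranking f* = (1 - alpha) (I - alpha S)^{-1} y."""
    nodes, index, W = build_graph(edges)
    S = normalize(W)
    n = len(nodes)
    A = [[(1.0 if i == j else 0.0) - alpha * S[i][j] for j in range(n)]
         for i in range(n)]
    y = [prior.get(node, 0.0) for node in nodes]
    f = [(1.0 - alpha) * v for v in solve(A, y)]
    # Keep only synonym-candidate nodes, highest score first.
    scores = {name: f[index[(kind, name)]] for kind, name in nodes if kind == "cand"}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_candidates(TOY_EDGES, TOY_PRIOR):
        print(f"{name}: {score:.4f}")
```

In this toy graph, "office 2007" has no path to the seeded keywords, so its score is zero, while both 2010 candidates receive positive scores. A direct linear solve is used only because the toy graph is tiny; a real system would presumably use sparse matrices and weight each interaction type separately rather than merging them into a single matrix.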
Similar resources
Hyperset approach to semi-structured databases and the experimental implementation of the query language Delta
This thesis presents practical suggestions towards the implementation of the hyperset approach to semi-structured databases and the associated query language ∆ (Delta). This work can be characterised as part of a top-down approach to semi-structured databases, from theory to practice. Over the last decade the rise of the World-Wide Web has led to the suggestion for a shift from structured rela...
Entity Type Recognition for Heterogeneous Semantic Graphs
We describe an approach to reducing the computational cost of identifying coreferent instances in heterogeneous semantic graphs where the underlying ontologies may not be informative or even known. The problem is similar to coreference resolution in unstructured text, where a variety of linguistic clues and contextual information is used to infer entity types and predict coreference. Semantic g...
Automatic Discovery of Similar Words
We deal with the issue of automatic discovery of similar words (synonyms and near-synonyms) from different kinds of sources: from large corpora of documents, from the Web, and from monolingual dictionaries. We present in detail three algorithms that extract similar words from a large corpus of documents and consider the specific case of the World Wide Web. We then describe a recent method of aut...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Behavior Query Discovery in System-Generated Temporal Graphs
Computer system monitoring generates huge amounts of logs that record the interaction of system entities. How to query such data to better understand system behaviors and identify potential system risks and malicious behaviors becomes a challenging task for system administrators due to the dynamics and heterogeneity of the data. System monitoring data are essentially heterogeneous temporal grap...